Support for AMD 3DNOW instruction set

The PAVGUSB instruction produces the rounded averages of the eight unsigned 8-bit integer values in the source operand (an MMX register or a 64-bit memory location) and the eight corresponding unsigned 8-bit integer values in the destination operand (an MMX register). It adds each source byte to the corresponding destination byte and then adds 001h to the 9-bit intermediate value. The intermediate value is then divided by 2 (shifted right one place), and the eight unsigned 8-bit results are stored in the MMX register specified as the destination operand. The PAVGUSB instruction can be used for pixel averaging in MPEG-2 motion compensation and video scaling operations.
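
The rounding rule described above can be checked with a small portable model. This is only a sketch in standard C, not the hardware instruction; the function name pavgusb_model is hypothetical and does not come from the mmx header.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference model of PAVGUSB (not an intrinsic):
   rounded average of eight unsigned byte pairs, written back to dst. */
static void pavgusb_model(uint8_t dst[8], const uint8_t src[8])
{
    for (size_t i = 0; i < 8; i++) {
        /* 9-bit intermediate sum plus the rounding constant 001h */
        unsigned sum = (unsigned)dst[i] + (unsigned)src[i] + 1u;
        dst[i] = (uint8_t)(sum >> 1);   /* divide by 2 */
    }
}
```

Because of the added 001h, an odd sum rounds up: averaging 10 and 11 gives 11, not 10.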

Numerical Range for the PF2ID Instruction

PF2ID

void _stdcall _pf2id(_mmxdata *array1,_mmxdata *array2,int n);

PF2ID is a vector instruction that converts a vector register containing single-precision, floating-point operands to 32-bit signed integers using truncation. The table below shows the numerical range of the PF2ID instruction. The PF2ID instruction performs the following operations:

IF (mmreg2/mem64[31:0] >= 2^31)

THEN mmreg1[31:0] = 7FFF_FFFFh

ELSEIF (mmreg2/mem64[31:0] <= –2^31)

THEN mmreg1[31:0] = 8000_0000h

ELSE mmreg1[31:0] = int(mmreg2/mem64[31:0])

IF (mmreg2/mem64[63:32] >= 2^31)

THEN mmreg1[63:32] = 7FFF_FFFFh

ELSEIF (mmreg2/mem64[63:32] <= –2^31)

THEN mmreg1[63:32] = 8000_0000h

ELSE mmreg1[63:32] = int(mmreg2/mem64[63:32])

Source operand (mmreg2/mem64)	Result in destination (mmreg1)
0	0
Normal, abs(source) < 1	0
Normal, –2147483648 < source <= –1	round to zero (source)
Normal, 1 <= source < 2147483648	round to zero (source)
Normal, source >= 2147483648	7FFF_FFFFh
Normal, source <= –2147483648	8000_0000h
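
The saturating conversion can be modeled for a single 32-bit lane in portable C. The following is a sketch under stated assumptions: the name pf2id_model is hypothetical, and NaN inputs are not handled because the text above does not define them.

```c
#include <stdint.h>

/* Hypothetical scalar model of one PF2ID lane (not an intrinsic):
   truncate toward zero and saturate outside the 32-bit signed range.
   Passing a NaN is outside the scope of this model. */
static int32_t pf2id_model(float x)
{
    if (x >= 2147483648.0f)      /* x >= 2^31 */
        return INT32_MAX;        /* 7FFF_FFFFh */
    if (x <= -2147483648.0f)     /* x <= -2^31 */
        return INT32_MIN;        /* 8000_0000h */
    return (int32_t)x;           /* C float-to-int conversion truncates */
}
```

For example, pf2id_model(1.9f) yields 1 and pf2id_model(-1.9f) yields -1, while any value of magnitude 2^31 or more is clamped to the nearest limit.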

PFACC

void _stdcall _pfacc(_mmxdata *array1,_mmxdata *array2,int n);

PFACC is a vector instruction that adds together the two single-precision values of the destination operand and, separately, the two single-precision values of the source operand, and stores the two sums in the low and high halves of the destination operand, respectively. Both operands are single-precision, floating-point operands with 24-bit significands.

The PFACC instruction performs the following operations:

mmreg1[31:0] = mmreg1[31:0] + mmreg1[63:32]

mmreg1[63:32] = mmreg2/mem64[31:0] + mmreg2/mem64[63:32]
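
A portable model makes the horizontal nature of this operation easier to see. The type vec2f and the function pfacc_model are hypothetical names used only for this sketch; they are not the _mmxdata type or an intrinsic from the mmx header.

```c
/* Hypothetical stand-in for one 64-bit register holding two floats. */
typedef struct { float low, high; } vec2f;

/* Hypothetical model of PFACC: each half of the result is the sum of
   both halves of one operand. */
static vec2f pfacc_model(vec2f dst, vec2f src)
{
    vec2f r;
    r.low  = dst.low + dst.high;   /* mmreg1[31:0]  */
    r.high = src.low + src.high;   /* mmreg1[63:32] */
    return r;
}
```

Passing the same value as both operands leaves the sum of its two halves in both result halves, which is one way to finish a dot product after an element-wise multiplication.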

PFADD

void _stdcall _pfadd(_mmxdata *array1,_mmxdata *array2,int n);

PFADD is a vector instruction that performs addition of the destination operand and the source operand. Both operands are single-precision, floating-point operands with 24-bit significands.

The PFADD instruction performs the following operations:

mmreg1[31:0] = mmreg1[31:0] + mmreg2/mem64[31:0]

mmreg1[63:32] = mmreg1[63:32] + mmreg2/mem64[63:32]

PFCMPEQ

void _stdcall _pfcmpeq(_mmxdata *array1,_mmxdata *array2,int n);

PFCMPEQ is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits based on the result of the corresponding comparison.

The PFCMPEQ instruction performs the following operations:

IF (mmreg1[31:0] = mmreg2/mem64[31:0])

THEN mmreg1[31:0] = FFFF_FFFFh

ELSE mmreg1[31:0] = 0000_0000h

IF (mmreg1[63:32] = mmreg2/mem64[63:32])

THEN mmreg1[63:32] = FFFF_FFFFh

ELSE mmreg1[63:32] = 0000_0000h

PFCMPGE

void _stdcall _pfcmpge(_mmxdata *array1,_mmxdata *array2,int n);

PFCMPGE is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits based on the result of the corresponding comparison.

The PFCMPGE instruction performs the following operations:

IF (mmreg1[31:0] >= mmreg2/mem64[31:0])

THEN mmreg1[31:0] = FFFF_FFFFh

ELSE mmreg1[31:0] = 0000_0000h

IF (mmreg1[63:32] >= mmreg2/mem64[63:32])

THEN mmreg1[63:32] = FFFF_FFFFh

ELSE mmreg1[63:32] = 0000_0000h

PFCMPGT

void _stdcall _pfcmpgt(_mmxdata *array1,_mmxdata *array2,int n);

PFCMPGT is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits based on the result of the corresponding comparison.

The PFCMPGT instruction performs the following operations:

IF (mmreg1[31:0] > mmreg2/mem64[31:0])

THEN mmreg1[31:0] = FFFF_FFFFh

ELSE mmreg1[31:0] = 0000_0000h

IF (mmreg1[63:32] > mmreg2/mem64[63:32])

THEN mmreg1[63:32] = FFFF_FFFFh

ELSE mmreg1[63:32] = 0000_0000h
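
The all-ones or all-zeros results of the three comparison instructions are meant to be used as bit masks. The sketch below models one 32-bit lane in portable C and shows the usual mask-and-select idiom. All function names are hypothetical; on real hardware the selection would be done with the MMX logical instructions (PAND, PANDN, POR).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical one-lane models of the three comparisons. */
static uint32_t pfcmpeq_lane(float a, float b) { return a == b ? 0xFFFFFFFFu : 0u; }
static uint32_t pfcmpge_lane(float a, float b) { return a >= b ? 0xFFFFFFFFu : 0u; }
static uint32_t pfcmpgt_lane(float a, float b) { return a >  b ? 0xFFFFFFFFu : 0u; }

/* Branch-free select: returns a where the mask is all ones,
   b where it is all zeros. */
static float select_lane(uint32_t mask, float a, float b)
{
    uint32_t ua, ub, ur;
    float r;
    memcpy(&ua, &a, sizeof ua);          /* reinterpret the float bits */
    memcpy(&ub, &b, sizeof ub);
    ur = (ua & mask) | (ub & ~mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}
```

For instance, select_lane(pfcmpgt_lane(x, y), x, y) picks the larger of x and y without a conditional jump.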

PFMAX

void _stdcall _pfmax(_mmxdata *array1,_mmxdata *array2,int n);

PFMAX is a vector instruction that returns the larger of the two single-precision, floating-point operands. Any operation with a zero and a negative number returns positive zero. An operation consisting of two zeros returns positive zero.

The PFMAX instruction performs the following operations:

IF (mmreg1[31:0] > mmreg2/mem64[31:0])

THEN mmreg1[31:0] = mmreg1[31:0]

ELSE mmreg1[31:0] = mmreg2/mem64[31:0]

IF (mmreg1[63:32] > mmreg2/mem64[63:32])

THEN mmreg1[63:32] = mmreg1[63:32]

ELSE mmreg1[63:32] = mmreg2/mem64[63:32]

PFMIN

void _stdcall _pfmin(_mmxdata *array1,_mmxdata *array2,int n);

PFMIN is a vector instruction that returns the smaller of the two single-precision, floating-point operands. Any operation with a zero and a positive number returns positive zero. An operation consisting of two zeros returns positive zero.

The PFMIN instruction performs the following operations:

IF (mmreg1[31:0] < mmreg2/mem64[31:0])

THEN mmreg1[31:0] = mmreg1[31:0]

ELSE mmreg1[31:0] = mmreg2/mem64[31:0]

IF (mmreg1[63:32] < mmreg2/mem64[63:32])

THEN mmreg1[63:32] = mmreg1[63:32]

ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
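
A common use of PFMAX and PFMIN together is clamping a value to a range. The sketch below follows the pseudocode above for one lane in portable C. The function names are hypothetical, and the special handling of signed zeros mentioned in the descriptions is not modeled.

```c
/* Hypothetical one-lane models that follow the pseudocode literally. */
static float pfmax_lane(float a, float b) { return a > b ? a : b; }
static float pfmin_lane(float a, float b) { return a < b ? a : b; }

/* Clamp x to [lo, hi]: first raise it to at least lo, then cap it at hi. */
static float clamp_lane(float x, float lo, float hi)
{
    return pfmin_lane(pfmax_lane(x, lo), hi);
}
```

With this composition, clamp_lane(5.0f, 0.0f, 1.0f) gives 1.0f and clamp_lane(-2.0f, 0.0f, 1.0f) gives 0.0f.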

PFMUL

void _stdcall _pfmul(_mmxdata *array1,_mmxdata *array2,int n);

PFMUL is a vector instruction that performs multiplication of the destination operand and the source operand. Both operands are single-precision, floating-point operands with 24-bit significands.

The PFMUL instruction performs the following operations:

mmreg1[31:0] = mmreg1[31:0] * mmreg2/mem64[31:0]

mmreg1[63:32] = mmreg1[63:32] * mmreg2/mem64[63:32]

PFRCP

void _stdcall _pfrcp(_mmxdata *array1,_mmxdata *array2,int n);

PFRCP is a scalar instruction that returns a low-precision estimate of the reciprocal of the source operand. The single result value is duplicated in both high and low halves of this instruction’s 64-bit result. The source operand is single-precision with a 24-bit significand, and the result is accurate to 14 bits. Increased accuracy (the full 24 bits of a single-precision significand) requires the use of two additional instructions (PFRCPIT1 and PFRCPIT2). The first stage of this increase or refinement in accuracy (PFRCPIT1) requires that the input and output of the already executed PFRCP instruction be used as input to the PFRCPIT1 instruction.

The PFRCP instruction performs the following operations:

mmreg1[31:0] = reciprocal(mmreg2/mem64[31:0])

mmreg1[63:32] = reciprocal(mmreg2/mem64[31:0])

PFRCPIT1

void _stdcall _pfrcpit1(_mmxdata *array1,_mmxdata *array2,int n);

PFRCPIT1 is a vector instruction that performs the first step in a Newton-Raphson iteration to refine the reciprocal approximation produced by the PFRCP instruction (the second and final step yields a result accurate to 24 bits). The behavior of this instruction is only defined for those combinations of operands such that one source operand was the input to the PFRCP instruction and the other source operand was the output of the same PFRCP instruction.

PFRCPIT2

void _stdcall _pfrcpit2(_mmxdata *array1,_mmxdata *array2,int n);

PFRCPIT2 is a vector instruction that performs the second and final step in a Newton-Raphson iteration to refine the reciprocal or reciprocal square root approximation produced by the PFRCP and PFRSQRT instructions, respectively.

The behavior of this instruction is only defined for those combinations of operands such that the first source operand (mmreg1) was the output of either the PFRCPIT1 or PFRSQIT1 instructions and the second source operand (mmreg2/mem64) was the output of either the PFRCP or PFRSQRT instructions.
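
The idea behind the reciprocal refinement can be illustrated with the textbook Newton-Raphson step x1 = x0 * (2 - b * x0), where b is the original input and x0 the low-precision estimate. The sketch below is portable C and rests on several assumptions: it does not reproduce how the hardware splits the work between PFRCPIT1 and PFRCPIT2, and coarse_recip is a hypothetical stand-in that only imitates a 14-bit estimate.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a low-precision estimate: compute 1/b and
   clear the 10 low significand bits, leaving about 14 bits.
   This imitates the stated precision only, not the PFRCP hardware. */
static float coarse_recip(float b)
{
    float x = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(uint32_t)0x3FF;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One textbook Newton-Raphson step for the reciprocal of b.
   It needs both the input b and the estimate x0, which is why the
   refinement is only meaningful for matching input/output pairs. */
static float recip_refine(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}
```

Each step roughly doubles the number of correct bits, so a single step applied to a 14-bit estimate is enough to approach the 24 bits of a single-precision significand.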

PFRSQRT

void _stdcall _pfrsqrt(_mmxdata *array1,_mmxdata *array2,int n);

PFRSQRT is a scalar instruction that returns a low-precision estimate of the reciprocal square root of the source operand. The single result value is duplicated in both high and low halves of this instruction’s 64-bit result. The source operand is single-precision with a 24-bit significand, and the result is accurate to 15 bits. Negative operands are treated as positive operands for purposes of reciprocal square root computation, with the sign of the result the same as the sign of the source operand. Increased accuracy (the full 24 bits of a single-precision significand) requires the use of two additional instructions (PFRSQIT1 and PFRCPIT2). The first stage of this increase or refinement in accuracy (PFRSQIT1) requires that the input and squared output of the already executed PFRSQRT instruction be used as input to the PFRSQIT1 instruction.
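
The reason the refinement needs the input together with the squared estimate becomes visible in the textbook Newton-Raphson step for the reciprocal square root, x1 = x0 * (3 - b * x0 * x0) / 2. The following portable C sketch shows that step only; the function name is hypothetical and the exact division of work between PFRSQIT1 and PFRCPIT2 is not reproduced.

```c
/* One textbook Newton-Raphson step for 1/sqrt(b), given an estimate x0.
   Hypothetical helper, not an intrinsic. */
static float rsqrt_refine(float b, float x0)
{
    float x0_squared = x0 * x0;                   /* the "squared output" */
    return x0 * (3.0f - b * x0_squared) * 0.5f;   /* uses b and x0^2 */
}
```

Starting from the rough guess 0.49 for 1/sqrt(4), one step gives about 0.4997 and a second step is accurate to better than 1e-5.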

PFSUB

void _stdcall _pfsub(_mmxdata *array1,_mmxdata *array2,int n);

PFSUB is a vector instruction that performs subtraction of the source operand from the destination operand. Both operands are single-precision, floating-point operands with 24-bit significands.

The PFSUB instruction performs the following operations:

mmreg1[31:0] = mmreg1[31:0] – mmreg2/mem64[31:0]

mmreg1[63:32] = mmreg1[63:32] – mmreg2/mem64[63:32]

PFSUBR

void _stdcall _pfsubr(_mmxdata *array1,_mmxdata *array2,int n);

PFSUBR is a vector instruction that performs subtraction of the destination operand from the source operand. Both operands are single-precision, floating-point operands with 24-bit significands.

The PFSUBR instruction performs the following operations:

mmreg1[31:0] = mmreg2/mem64[31:0] – mmreg1[31:0]

mmreg1[63:32] = mmreg2/mem64[63:32] – mmreg1[63:32]
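
PFSUB and PFSUBR differ only in operand order, which a portable model shows side by side. The type vec2f and both function names are hypothetical and used only for this sketch.

```c
/* Hypothetical stand-in for one 64-bit register holding two floats. */
typedef struct { float low, high; } vec2f;

/* Hypothetical model of PFSUB: destination minus source. */
static vec2f pfsub_model(vec2f dst, vec2f src)
{
    vec2f r = { dst.low - src.low, dst.high - src.high };
    return r;
}

/* Hypothetical model of PFSUBR: source minus destination. */
static vec2f pfsubr_model(vec2f dst, vec2f src)
{
    vec2f r = { src.low - dst.low, src.high - dst.high };
    return r;
}
```

In both cases the result replaces the destination; with a source of zeros, the reversed form computes the negation of the destination.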

PI2FD

void _stdcall _pfi2fd(_mmxdata *array1,_mmxdata *array2,int n);

PI2FD is a vector instruction that converts a vector register containing signed, 32-bit integers to single-precision, floating-point operands. When PI2FD converts an input operand with more significant digits than are available in the output, the output is truncated.

The PI2FD instruction performs the following operations:

mmreg1[31:0] = float(mmreg2/mem64[31:0])

mmreg1[63:32] = float(mmreg2/mem64[63:32])
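
Truncation matters because an ordinary C cast from int to float normally rounds to nearest instead. The sketch below models one lane in portable C, assuming that "truncated" means the excess low-order bits of the magnitude are dropped (rounding toward zero). The function name pi2fd_model is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one PI2FD lane (not an intrinsic): convert a
   signed 32-bit integer to float, dropping bits beyond the 24-bit
   significand instead of rounding. Assumes truncation toward zero. */
static float pi2fd_model(int32_t v)
{
    /* Work on the magnitude as unsigned so INT32_MIN does not overflow. */
    uint32_t mag = v < 0 ? 0u - (uint32_t)v : (uint32_t)v;
    int shift = 0;

    /* Find how many low bits must go to leave at most 24 significant bits. */
    while ((mag >> shift) > 0xFFFFFFu)
        shift++;
    mag = (mag >> shift) << shift;       /* clear the dropped bits */

    /* mag is now exactly representable, so this conversion is exact. */
    float f = (float)mag;
    return v < 0 ? -f : f;
}
```

For the input 16777219 (2^24 + 3) this model returns 16777218, whereas round-to-nearest conversion would give 16777220.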

PMULHRW

void _stdcall _pfmulhrw(_mmxdata *array1,_mmxdata *array2,int n);

The PMULHRW instruction multiplies the four signed 16-bit integer values in the source operand (an MMX register or a 64-bit memory location) by the four corresponding signed 16-bit integer values in the destination operand (an MMX register). The PMULHRW instruction then adds 8000h to the lower 16 bits of the 32-bit result, which results in the rounding of the high-order, 16-bit result. The high-order 16 bits of the result (including the sign bit) are stored in the destination operand.

The PMULHRW instruction provides a numerically more accurate result than the PMULHW instruction, which truncates the result instead of rounding.
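
The difference between the rounded and the truncated high-order product can be seen in a one-lane portable model. The function names are hypothetical. The sketch also assumes that right-shifting a negative signed value is an arithmetic shift, which is implementation-defined in C but holds on common compilers.

```c
#include <stdint.h>

/* Hypothetical one-lane model of PMULHRW: add 8000h to the 32-bit
   product, then keep the high-order 16 bits.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t pmulhrw_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)((product + 0x8000) >> 16);
}

/* Hypothetical one-lane model of the truncating multiply, for comparison. */
static int16_t pmulhw_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 16);
}
```

With the operands 16384 and 3 the 32-bit product is C000h, which is 0.75 in units of the high word: the rounded form returns 1 and the truncating form returns 0.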

Example of a 3DNOW program in C

This is a complete example of how to use these instructions.

#include <stdio.h>
// Always include the mmx header!
#include <mmx.h>

//***********************************************
// Calculate the squares of 8 floating point numbers stored in an
// mmx data vector. Each member of the array contains 2 floats.
//***********************************************
int main(void)
{
    _mmxdata data[4];
    int i;

    // Fill the array
    for (i=0; i<4;i++) {
        data[i].Floats.high = (float)i*2;
        data[i].Floats.low = (float)(i*2+1);
    }
    // Execute the multiplication
    _pfmul(data,data,4);
    // Always finish the MMX state before calling any external
    // function like printf
    _emms();
    // Display the results
    for (i=0; i<4; i++) {
        printf("%d %f\t",i*2,data[i].Floats.high);
        printf("%d %f\n",1+i*2,data[i].Floats.low);
    }
    return 0;
}

The output of this program is:

0 0.000000 1 1.000000

2 4.000000 3 9.000000

4 16.000000 5 25.000000

6 36.000000 7 49.000000
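
On a machine without 3DNOW support, the same computation can be reproduced with a portable model of the array call. This sketch assumes, based on the example program, that the first array receives the element-wise results. The type vec2f and the function pfmul_array_model are hypothetical stand-ins, not part of the mmx header.

```c
#include <stddef.h>

/* Hypothetical stand-in for one array element holding two floats. */
typedef struct { float low, high; } vec2f;

/* Hypothetical portable model of the array multiplication: each element
   of array1 is multiplied, half by half, by the matching element of
   array2. The two arrays may be the same, as in the example program. */
static void pfmul_array_model(vec2f *array1, const vec2f *array2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        array1[i].low  = array1[i].low  * array2[i].low;
        array1[i].high = array1[i].high * array2[i].high;
    }
}
```

Filling four elements as in the program and calling pfmul_array_model(data, data, 4) leaves the squares of 0 through 7 in the array, matching the output shown.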